Abstract

Relational models for contingency tables are generalizations of log-linear models, allowing
effects associated with arbitrary subsets of cells in a possibly incomplete table,
and not necessarily containing the overall effect. In this generality, the MLEs under
Poisson and multinomial sampling are not always identical. This paper deals with the
theory of maximum likelihood estimation in the case when there are observed zeros in
the data. A unique MLE for such data is shown to always exist in the set of pointwise
limits of sequences of distributions in the original model. This set is equal to the closure
of the original model with respect to the Bregman information divergence. The same
variant of iterative scaling may be used to compute the MLE in the original model and
in its closure.

Keywords: algebraic variety, Bregman divergence, contingency table, extended 
1 Introduction


The existence of maximum likelihood estimates under log-linear models for contingency
tables has been thoroughly studied, see Haberman [1974], Andersen [1974], Barndorff-Nielsen
[1978], Lauritzen [1996], among others. It was established that the maximum likelihood
estimates of the cell parameters always exist if the observed table has only positive cell
counts, and may exist if some of the observed counts are zero. The patterns of zero cells
that lead to the non-existence of the MLE were described in several forms [cf. Haberman,
1974, Fienberg and Rinaldo, 2012].

Within the extended log-linear model class all data sets have an MLE, irrespective of
the pattern of zeros. An extended log-linear model may be obtained as the closure of the
original model in the topology of pointwise convergence [cf. Lauritzen, 1996], or the closure
with respect to the Kullback-Leibler divergence [cf. Csiszár and Matúš, 2003], or as the
aggregate exponential family [Brown, 1988].
The contribution of this paper is motivated by statistical problems in which models more
general than log-linear need to be considered. To illustrate, suppose that the management
of a large supermarket classifies all goods on stock into one of three mutually exclusive and
exhaustive categories, say, food (F), non-food household (N) and other (O), and wishes
to study how the daily sales of each group are related. This is a standard task in market
basket analysis [cf. Brin et al., 1997]. The first model of interest, routinely, is independence,
but the usual model of independence of the three indicator variables is not applicable in
this case: if $p_F$, $p_N$ and $p_O$ denote the probabilities that a purchase (a basket) contains an
item from the F, N and O groups, then the probability of an empty purchase would be
$(1 - p_F)(1 - p_N)(1 - p_O)$, which has to be positive, in spite of the fact that there are no
purchases which do not contain any items.

One alternative independence concept to apply is the AS-independence of the three variables
[Aitchison and Silvey, 1960]. The indicator variables F, N, and O are said to be
AS-independent if

$$p_{FN} = p_F p_N, \quad p_{FO} = p_F p_O, \quad p_{NO} = p_N p_O, \quad p_{FNO} = p_F p_N p_O. \tag{1}$$

Relational models introduced by Klimova, Rudas, and Dobra [2012] contain model (1) and
many other models of association.

A relational model on a contingency table is generated by a class of non-empty subsets
of cells and can be specified in the form:

$$\log \delta = A'\beta. \tag{2}$$


Here, $\delta$ denotes the vector of cell parameters, probabilities or intensities, and $A$ is the 0-1
matrix whose rows are the indicators of generating subsets. A hierarchical log-linear model
[cf. Bishop, Fienberg, and Holland, 1975] applies to a table which is a Cartesian product,
and the model is generated by a collection of cylinder sets corresponding to marginals of
the table and thus is a special case of a relational model. If the row space of $A$ contains
the vector $1' = (1, \dots, 1)$, as in the case of hierarchical log-linear models, then the model is
said to include the overall effect. A model with the overall effect can be parameterized to
include a common parameter in every cell, often called the normalizing constant. The models
without the overall effect cannot be parameterized in such a way. The peculiar property of
relational models without the overall effect is that models for probabilities (appropriate
under multinomial sampling) and models for intensities (appropriate for Poisson sampling)
are different and lead to different MLEs. Let $y$ denote the observed frequency distribution.
Then, when the overall effect is not present, the MLE for probabilities does not preserve the
sufficient statistics $Ay$, and, for intensities, it does not preserve the observed total $1'y$, see
Example 2.1.

An iterative scaling procedure based on Bregman divergence can be used to compute the
MLE under relational models [Klimova and Rudas, 2015]. The Bregman divergence between
two distributions is a generalization of the Kullback-Leibler divergence, but, unlike the latter,
stays non-negative whether or not the two distributions have the same total. This property
is essential for relational models for intensities without the overall effect as these models may
include distributions with different totals.

If the observed frequencies are positive and the model matrix is of full row rank, the
MLE under relational models can be computed using algorithms for convex optimization
[cf. Bertsekas, 1999, Aitchison and Silvey, 1960, Evans and Forcina, 2013] or the
Newton-Raphson algorithm. A detailed discussion of the relative advantages and disadvantages
of variants of iterative proportional fitting was given in Klimova and Rudas [2015]. The
contribution of the present paper is the investigation of cases when there are observed zero
frequencies in the data, and of the closure of relational models under which such data will
always admit an MLE. Of course, if only three groups of goods, as in the example above,
are investigated, one cannot expect to see an observed zero, but if 1000 groups of goods are
investigated, out of the resulting $2^{1000} - 1$ groups, many will be empty. As it turns out, the
pattern of observed zeros has far-reaching implications for the existence and kind of MLE
obtained.

A necessary and sufficient condition for the existence of the maximum likelihood estimates
of the cell parameters under relational models is obtained in Section 2. The MLE for $y$ exists
if and only if there is a positive vector $z$ such that $Az = Ay$. This is literally the same
condition as the one that applies to log-linear models.

In Section 3 extended relational models are studied. The extended relational model
is defined as the set of distributions parameterized by the elements of an algebraic variety
associated with the model matrix of the original relational model. It is shown that this set
is equal to the closure of the original model with respect to both the pointwise convergence
and the Bregman divergence.

In Section 4 a polyhedral condition for the existence of the MLE in the original or
the extended relational model is formulated. If the vector of the sufficient statistics, $Ay$,
of the observed distribution is not contained in any of the faces of the polyhedral cone
associated with the model matrix, the MLE exists in the original model, and otherwise,
it does in the extended model. This condition is the same as for the log-linear case, but
the proof is very different. The multiplicative representation of the distributions in the
extended model and the existence of the MLEs of the model parameters are also discussed
in this section. Finally, the generalized iterative proportional fitting procedure suggested in
Klimova and Rudas [2015] is extended to the case of observed zeros.


While the conditions of the existence of the MLE in the generality considered in this paper
may be formulated to coincide with the known conditions for the case of log-linear models,
the proofs turn out to be more involved. Also, the algorithm to obtain the MLEs is
more complex. The additional complications come from properties of the MLE when the
overall effect is not present. In fact, Lauritzen [1996, p. 75] mentioned the existence of models
without the overall effect, which he called the "constant function", but to avoid difficulties
did not consider them. On the other hand, such models have been used in practice, see
references in Klimova et al. [2012], Klimova and Rudas [2015].


2 MLE under relational models 

Let $Y_1, \dots, Y_K$ be discrete random variables with finite ranges, and the vector $\mathcal{I}$ of length
$|\mathcal{I}|$ be their joint sample space. Here, $\mathcal{I}$ may also be a proper subset of the Cartesian product
of the ranges of the variables. A distribution on $\mathcal{I}$ is parameterized by the cell parameters
$\delta = \{\delta_i, \text{ for } i \in \mathcal{I}\}$, and, to simplify notation, is identified with $\delta$. The components of $\delta$ are
either probabilities: $\delta_i = p_i \in (0,1)$, with $\sum_{i \in \mathcal{I}} p_i = 1$, or intensities: $\delta_i = \lambda_i > 0$, for all
$i \in \mathcal{I}$. Let $\mathcal{P}$ denote the set of positive distributions, $\delta > 0$, on $\mathcal{I}$.

Let $A$ be a 0-1 matrix of size $J \times |\mathcal{I}|$, which is interpreted as the indicator matrix of
$J$ subsets generating the model. Assume that $A$ has no zero column. A relational model
$RM_\delta(A)$ is the following set of distributions:

$$RM_\delta(A) = \Big\{\delta \in \mathcal{P}: \ \delta_i = \prod_{j=1}^{J} \theta_j^{a_{ji}}, \ \theta \in \mathbb{R}^{J}_{>0}\Big\}, \tag{3}$$

where $\theta = (\theta_1, \dots, \theta_J)' \in \mathbb{R}^{J}_{>0}$ denotes the vector of parameters associated with the generating
subsets. Under the model, the cell parameters are equal to the products of the parameters
$\theta$ corresponding to the subsets to which the cell belongs. In the sequel, the components of
$\theta$ are referred to as the multiplicative parameters, and $A$ is assumed to be of full row rank.
In fact, the model $RM_\delta(A)$ is uniquely determined by the row space of its model matrix,
$\mathcal{R}(A)$. Relational models for which $1' \in \mathcal{R}(A)$ are said to include the overall effect.
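
The multiplicative structure in (3) can be illustrated with a minimal sketch. The function name `cell_parameters` and the parameter values are hypothetical and chosen only for illustration; the matrix is the model matrix of AS-independence that appears in Example 2.1.

```python
from math import prod

def cell_parameters(A, theta):
    """Return delta_i = prod_j theta_j ** a_ji for every cell i, as in (3)."""
    n_cells = len(A[0])
    return [prod(t ** row[i] for t, row in zip(theta, A)) for i in range(n_cells)]

# Model matrix of AS-independence; cells ordered F, N, O, FN, FO, NO, FNO.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
theta = [0.5, 2.0, 3.0]  # hypothetical multiplicative parameters
delta = cell_parameters(A, theta)
print(delta)  # each cell is the product of the parameters of the subsets containing it
```

Because the total of `delta` is not constrained here, the sketch corresponds to a model for intensities; a model for probabilities would additionally require the components to sum to 1.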

A dual representation of a relational model $RM_\delta(A)$ can be obtained using the kernel
basis matrix $D$, whose rows, $d_1, \dots, d_K$, are a basis of $Ker(A)$. In this representation, any
distribution in the model satisfies

$$D \log \delta = 0, \tag{4}$$

which can be re-written using the generalized odds ratios:

$$\delta^{d_1} = 1, \quad \delta^{d_2} = 1, \quad \dots, \quad \delta^{d_K} = 1, \tag{5}$$

or using the cross-product differences:

$$\delta^{d_1^+} - \delta^{d_1^-} = 0, \quad \delta^{d_2^+} - \delta^{d_2^-} = 0, \quad \dots, \quad \delta^{d_K^+} - \delta^{d_K^-} = 0, \tag{6}$$

where $d^+$ and $d^-$ denote, respectively, the positive and negative parts of a vector $d$
[Klimova et al., 2012].


The properties of the maximum likelihood estimates under relational models are reviewed
next. Let $Y = (Y_1, \dots, Y_{|\mathcal{I}|})$ be a random variable that has a multivariate Poisson distribution
parameterized by $\delta = \lambda$ or a multinomial distribution parameterized by $N$ and $\delta = p$. Let
$y$ be a realization of $Y$, and

$$q = \begin{cases} y, & \text{if } \delta = \lambda, \\ y/(1'y), & \text{if } \delta = p. \end{cases} \tag{7}$$


If the MLE $\hat{\delta}_y$ of the cell parameters under the model $RM_\delta(A)$ exists, it is the unique
solution to the system of equations:

$$A\delta = \gamma A q, \qquad D \log \delta = 0, \qquad 1'\delta = 1 \ (\text{only for } \delta = p). \tag{8}$$

The value of $\gamma$ is called the adjustment factor. If $RM_\delta(A)$ is a model for probabilities with
the overall effect or a model for intensities, then $\gamma = 1$ for every $y$. If $RM_p(A)$ is a model
for probabilities without the overall effect, then the value of $\gamma$ depends on $y$ [Klimova et al.,
2012].
Table 1: Maximum likelihood estimates under the model of AS-independence of variables F,
N, O, under the multinomial and Poisson sampling.

                  O = No                O = Yes
             N = No    N = Yes     N = No    N = Yes
F = No       empty     14          25        16          - observed
             empty     27.33       32.60     8.91        - multinomial
             empty     3.31        7.29      24.13       - Poisson
F = Yes      10        5           3         27          - observed
             18.46     5.04        6.02      1.64        - multinomial
             1.26      4.17        9.18      30.39       - Poisson


Example 2.1. The model of AS-independence (1) is a relational model generated by the
model matrix

$$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}, \tag{9}$$

where the order of cells is lexicographic. As $1'$ is not in the row space of $A$, the model
does not have the overall effect. Thus, the models $RM_\lambda(A)$ and $RM_p(A)$ are not equivalent.
Given hypothetical data, the MLE for cell frequencies, computed under the model
for probabilities and under the model for intensities, are shown in Table 1. In the case of
probabilities, the estimates for sufficient statistics are about 0.7 times the observed
sufficient statistics. In the case of intensities, the estimated total is approximately 79.73,
while the observed total is 100. The estimates were obtained using the R-package gIPFrm
[Klimova and Rudas, 2014]. □
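
The two statements about Table 1 can be checked with a few lines of arithmetic. The sketch below only re-uses the numbers printed in Table 1, arranged in the cell order F, N, O, FN, FO, NO, FNO of the model matrix (9); the helper name `subset_sums` is an assumption made for this illustration.

```python
# Model matrix (9); cells ordered F, N, O, FN, FO, NO, FNO.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
# Values copied from Table 1, in the same cell order.
observed = [10, 14, 25, 5, 3, 16, 27]
multinomial = [18.46, 27.33, 32.60, 5.04, 6.02, 8.91, 1.64]
poisson = [1.26, 3.31, 7.29, 4.17, 9.18, 24.13, 30.39]

def subset_sums(A, v):
    """Sufficient statistics A v: one sum per generating subset."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

obs_stats = subset_sums(A, observed)
ratios = [m / o for m, o in zip(subset_sums(A, multinomial), obs_stats)]
pois_stats = subset_sums(A, poisson)

print("observed subset sums:  ", obs_stats)
print("multinomial / observed:", [round(r, 3) for r in ratios])
print("Poisson subset sums:   ", [round(s, 2) for s in pois_stats])
print("totals (obs, mult, Poisson):",
      sum(observed), round(sum(multinomial), 2), round(sum(poisson), 2))
```

Under multinomial sampling the total is preserved while every subset sum is scaled by the same adjustment factor; under Poisson sampling the subset sums are preserved (up to rounding in the table) while the total is not.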


A necessary and sufficient condition for the existence of the MLE is given in the next
theorem. Its proof uses the following lemma:

Lemma 2.1. If $y > 0$, the MLE $\hat{\delta}_y$ exists.

Proof. A relational model for intensities is a regular exponential family [Klimova et al., 2012]
and the standard proof applies [cf. Andersen, 1974].

In the case of probabilities, $\delta = p$, the MLE, if it exists, is the unique solution to (8).
Klimova and Rudas [2015, Lemma 3.5] showed that there exist $\gamma_1, \gamma_2 > 0$ such that the
adjustment factor $\gamma \in [\gamma_1, \gamma_2]$. Since $\gamma y > 0$, the MLE $\hat{\lambda}_{\gamma y}$ under the model for intensities
$RM_\lambda(A)$ exists for every $\gamma \in [\gamma_1, \gamma_2]$, and, by Lemma 3.6 in Klimova and Rudas [2015], one
can find a unique $\gamma^*$ such that $1'\hat{\lambda}_{\gamma^* y} = 1$. Because $\hat{\lambda}_{\gamma^* y}$ satisfies (8), $\hat{p}_y = \hat{\lambda}_{\gamma^* y}$. □

As shown next, the MLE may exist when some of the observed frequencies are zero.


Example 2.1 (revisited):
Let $q = (0, 0, 0, 0, 0, 0, 1)'$ be the observed distribution. Under the model of AS-independence,
the MLEs for cell probabilities exist and are equal to 

□ 


Theorem 2.2. Let $y$ be the vector of observed frequencies under Poisson or multinomial
sampling, and let $RM_\delta(A)$ be a relational model. The MLE $\hat{\delta}_y$ under the model exists if and
only if there is a positive vector $z$, such that $Az = Aq$, with $q$ defined in (7).

Proof. Suppose the MLE $\hat{\delta}_y$ exists. By (8), $A\hat{\delta}_y = \gamma A q$, and thus $\hat{\delta}_y = \gamma q + d$ for some
$d \in Ker(A)$. Take

$$z = \frac{1}{\gamma}\hat{\delta}_y = q + \frac{1}{\gamma} d > 0.$$

Then, as $\frac{1}{\gamma} d \in Ker(A)$, $Az = Aq$, as required.

To prove the converse, assume that there exists a $z > 0$, such that $Az = Aq$. Thus,
$z = q + d$ for some $d \in Ker(A)$. Let

$$d_1 = \frac{d}{1 + 1'd},$$

and note that $1 + 1'd = 1'q + 1'd = 1'z > 0$. Next, consider $\nu = (1 - 1'd_1)q + d_1$. Then
$1'\nu = (1 - 1'd_1) + 1'd_1 = 1$, and

$$\nu = (1 - 1'd_1)q + d_1 = \frac{q}{1 + 1'd} + \frac{d}{1 + 1'd} = \frac{z}{1 + 1'd} > 0.$$

Therefore, $\nu$ is a positive probability distribution, and, by Lemma 2.1, the MLE $\hat{p}_\nu$ exists
and satisfies:

$$A\hat{p}_\nu = \gamma_\nu A\nu, \qquad D \log \hat{p}_\nu = 0, \qquad 1'\hat{p}_\nu = 1,$$

for some $\gamma_\nu > 0$. Then, from the definition of $\nu$, $A\hat{p}_\nu = \gamma_\nu A\nu = \gamma_\nu(1 - 1'd_1)Aq$, that is,
$\hat{p}_\nu$ is also the MLE for $q$ with the adjustment factor $\gamma = \gamma_\nu(1 - 1'd_1)$. □

The statement of the theorem is illustrated in the next example. 

Example 2.2. Let $RM_p(A)$ be the model for probabilities generated by

$$A = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 \end{pmatrix},$$

and let $q = (3/7, 3/7, 0, 1/7, 0)'$ be the observed probability distribution. Consider any vector
$z$, whose subset sums, $Az$, are equal to the observed subset sums:

$$z_1 + z_2 + z_3 + z_5 = 6/7, \qquad z_1 + z_2 + z_5 = 6/7, \qquad z_1 + z_4 + z_5 = 4/7.$$

The first two equations imply that $z_3 = 0$. Therefore, there is no (strictly) positive distribution
with the same subset sums as those observed, and thus, $q$ does not have an MLE in
the model. □

In the next section, an extended relational model is defined as the polynomial variety
corresponding to the model matrix. It is further shown that the extended model coincides
with the set of pointwise limits of sequences of distributions in the original model and is also
the closure with respect to Bregman information divergence.


3 Extended relational models 


Let $A$ be the model matrix of a relational model, and let $X_A$ denote the polynomial variety
associated with $A$ [Sturmfels, 1996]:

$$X_A = \Big\{\delta \in \mathbb{R}^{|\mathcal{I}|}_{\geq 0} : \ \delta^{d^+} = \delta^{d^-}, \ \forall d \in Ker(A)\Big\}. \tag{10}$$

Definition 3.1. The extended relational model for intensities, $\overline{RM}_\lambda(A)$, is the set of
distributions

$$\lambda \in X_A. \tag{11}$$

The extended relational model for probabilities, $\overline{RM}_p(A)$, is the set of distributions

$$p \in X_A \cap \Delta_{|\mathcal{I}|-1}, \tag{12}$$

where $\Delta_{|\mathcal{I}|-1}$ is the $(|\mathcal{I}| - 1)$-dimensional simplex. □

For positive distributions, being in $X_A$ is equivalent to the representations (4), (5), and
(6). Therefore, the relational model generated by $A$ is a subset of the corresponding extended
model. For a positive $\delta$, whether or not (4), (5), and (6) hold does not depend on the choice
of $D$. However, as illustrated next, there exist $\delta \geq 0$, which, due to the pattern of zeros,
satisfy (6) for some choice of $D$ and do not satisfy it for another.


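Example 2.1 (revisited): The model has dual representations using matrices $D_1$ and $D_2$:

$$D_1 = \begin{pmatrix} 1 & 1 & 0 & -1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & -1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & -1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & -1 \end{pmatrix}, \qquad
D_2 = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 & 1 & 0 & -1 \\ 1 & 0 & 0 & 0 & 0 & 1 & -1 \\ 1 & 1 & 1 & 0 & 0 & 0 & -1 \end{pmatrix}.$$

The distribution $\delta = (0, 0, 0, 1, 1, 1, 0)'$ satisfies (6) if obtained from $D_2$, but does not satisfy
(6) if obtained using $D_1$, and therefore, $\delta \notin X_A$. □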
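
The dependence on the choice of the kernel basis can be verified directly. The following sketch evaluates the cross-product differences in (6) for both matrices; the function name `cross_product_differences` is an assumption made for this illustration, and the convention $0^0 = 1$ (which Python's integer power also follows) is used for cells with zero exponent.

```python
from math import prod

def cross_product_differences(D, delta):
    """For each row d of D return delta**(d+) - delta**(d-), as in (6)."""
    diffs = []
    for d in D:
        pos = prod(x ** max(k, 0) for x, k in zip(delta, d))   # delta^(d+)
        neg = prod(x ** max(-k, 0) for x, k in zip(delta, d))  # delta^(d-)
        diffs.append(pos - neg)
    return diffs

D1 = [
    [1, 1, 0, -1, 0, 0, 0],
    [1, 0, 1, 0, -1, 0, 0],
    [0, 1, 1, 0, 0, -1, 0],
    [1, 1, 1, 0, 0, 0, -1],
]
D2 = [
    [0, 0, 1, 1, 0, 0, -1],
    [0, 1, 0, 0, 1, 0, -1],
    [1, 0, 0, 0, 0, 1, -1],
    [1, 1, 1, 0, 0, 0, -1],
]
delta = [0, 0, 0, 1, 1, 1, 0]

diffs_D1 = cross_product_differences(D1, delta)
diffs_D2 = cross_product_differences(D2, delta)
print("D1:", diffs_D1)  # some differences are non-zero
print("D2:", diffs_D2)  # all differences vanish
```

Membership in $X_A$ requires the differences to vanish for every vector in $Ker(A)$, so a single basis on which they vanish, such as $D_2$ here, is not sufficient when $\delta$ has zero components.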
The support $supp(\delta) = \{i \in \mathcal{I} : \delta_i > 0\}$ of distributions with zero components which
are in $X_A$ can be characterized using the concept of a facial set which is defined next.

Let $a_1, \dots, a_{|\mathcal{I}|}$ denote the columns of $A$, and let $C_A$ be the set of all non-negative linear
combinations of these columns:

$$C_A = \Big\{t \in \mathbb{R}^{J}_{\geq 0} : \ \exists \delta \in \mathbb{R}^{|\mathcal{I}|}_{\geq 0}, \ t = A\delta\Big\}. \tag{13}$$

The relative interior of $C_A$, $\mathrm{relint}(C_A)$, comprises such $t$ for which there exists a
(strictly) positive $\delta$ that satisfies $t = A\delta$.

The set $C_A$ is a polyhedral cone in $\mathbb{R}^{J}_{\geq 0}$. If an affinely independent set $a_{i_1}, a_{i_2}, \dots, a_{i_f}$ of
columns of $A$ spans a proper face of $C_A$, the set of indices $F = \{i_1, i_2, \dots, i_f\}$ is called facial
[cf. Grünbaum, 2003, Geiger et al., 2006]. The facial sets of $A$ are determined by its row
space [cf. Fienberg and Rinaldo, 2012]. If $t \in C_A \setminus \mathrm{relint}(C_A)$, then $t$ is said to be on a face
of $C_A$. In that case, there is a facial set $F = F(t)$, such that

$$t = \sum_{i \in F} \delta_i a_i, \quad \delta_i > 0. \tag{14}$$

Equivalently, a set $F$ is facial if and only if there exists a $c \in \mathbb{R}^{J}$ such that $c'a_i = 0$ for
every $i \in F$ and $c'a_i > 0$ for every $i \notin F$. The properties of facial sets are formulated in
Lemma A.1 given in the Appendix. In particular, only distributions whose support is $\mathcal{I}$ or
a facial set of $A$ may belong to $X_A$. As an example, the facial sets of the model matrix (9)
of AS-independence are $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2,4\}$, $\{2,3,6\}$, $\{1,3,5\}$. The support $\{4,5,6\}$ of
$\delta = (0, 0, 0, 1, 1, 1, 0)'$ from Example 2.1 is not a facial set, and thus $\delta$ cannot be an element
of $X_A$.
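
The characterization of facial sets through a vector $c$ lends itself to a simple check. The sketch below searches a small integer grid of candidate vectors $c$ for the matrix (9) and records the index sets they certify. The grid search and the function name `facial_sets_from_grid` are assumptions made for illustration: this is enough for the small matrix at hand, but it is not a general algorithm for enumerating facial sets.

```python
from itertools import product

# Model matrix (9); cells are numbered 1..7.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
columns = list(zip(*A))

def facial_sets_from_grid(columns, values=(-1, 0, 1)):
    """Collect index sets F certified by some c on the grid:
    c'a_i = 0 for i in F and c'a_i > 0 for i not in F."""
    found = set()
    for c in product(values, repeat=len(columns[0])):
        dots = [sum(ck * ak for ck, ak in zip(c, col)) for col in columns]
        if min(dots) < 0:
            continue  # c is not non-negative on all columns
        F = frozenset(i + 1 for i, v in enumerate(dots) if v == 0)
        if 0 < len(F) < len(columns):  # skip the empty set and the full index set
            found.add(F)
    return found

facial = facial_sets_from_grid(columns)
print(sorted(sorted(F) for F in facial))
print("is {4, 5, 6} certified?", frozenset({4, 5, 6}) in facial)
```

For this matrix, any $c$ with $c'a_4 = c'a_5 = c'a_6 = 0$ must be the zero vector, which is why no certificate for $\{4,5,6\}$ can exist, on the grid or elsewhere.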

The following theorem describes the structure of the parameter set of the extended
relational model.

Theorem 3.1. The extended relational model $\overline{RM}_\delta(A)$ is the closure of the relational model
$RM_\delta(A)$ in the topology of pointwise convergence: $\overline{RM}_\delta(A) = cl(RM_\delta(A))$.

The proof is provided in the Appendix. The theorem says that every distribution in the
extended model can be obtained as a pointwise limit of a sequence of distributions in the
non-extended model. In the following example, such a sequence is found using the construction
described in the proof.


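Example 2.1 (revisited):

The set $F = \{2, 3, 6\}$ is a facial set of $A$, and thus, by Lemma A.1, the extended model
contains a distribution $p = (0, p_2, p_3, 0, 0, p_6, 0)'$, where $p_2, p_3, p_6 > 0$ and $p_2 + p_3 + p_6 = 1$. To
construct a sequence of distributions in the original model which converges to $p$, find $\theta_2, \theta_3$
such that

$$\theta_2 = p_2, \quad \theta_3 = p_3, \quad \theta_2\theta_3 = p_6.$$

From the normalization condition, $\theta_2 = (1 - \theta_3)/(1 + \theta_3)$.

Take an arbitrary $\theta_1 \in (0,1)$, then set

$$\theta_2^{(n)} = \frac{1 - \theta_1 n^{-1} - \theta_3 - \theta_1 n^{-1}\theta_3}{1 + \theta_1 n^{-1} + \theta_3 + \theta_1 n^{-1}\theta_3},$$

and consider

$$p^{(n)} = \big(\theta_1 n^{-1}, \ \theta_2^{(n)}, \ \theta_3, \ \theta_1 n^{-1}\theta_2^{(n)}, \ \theta_1 n^{-1}\theta_3, \ \theta_2^{(n)}\theta_3, \ \theta_1 n^{-1}\theta_2^{(n)}\theta_3\big)'.$$

For every $n$, $p^{(n)} \in RM_p(A)$. As $n \to \infty$, $\theta_2^{(n)} \to \theta_2$, and therefore, $p^{(n)} \to p$. The
construction is complete. □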
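
The construction can be traced numerically. The sketch below uses hypothetical values $\theta_1 = 0.5$ and $\theta_3 = 0.25$ (so that $\theta_2 = 0.6$ by the normalization condition) and checks that each member of the sequence is a positive probability distribution of the multiplicative form and that the sequence approaches $p$; the function name `sequence_member` is an assumption made for this illustration.

```python
def sequence_member(n, theta1, theta3):
    """n-th distribution of the sequence, cells ordered F, N, O, FN, FO, NO, FNO."""
    a = theta1 / n  # the first multiplicative parameter shrinks to zero
    theta2_n = (1 - a - theta3 - a * theta3) / (1 + a + theta3 + a * theta3)
    return [a, theta2_n, theta3,
            a * theta2_n, a * theta3, theta2_n * theta3,
            a * theta2_n * theta3]

theta1, theta3 = 0.5, 0.25            # hypothetical choices
theta2 = (1 - theta3) / (1 + theta3)  # normalization of the limit distribution
p_limit = [0, theta2, theta3, 0, 0, theta2 * theta3, 0]

for n in (1, 10, 1000, 10**6):
    p_n = sequence_member(n, theta1, theta3)
    distance = max(abs(x - y) for x, y in zip(p_n, p_limit))
    print(n, "total =", round(sum(p_n), 12), "max distance to p =", distance)
```

Every member of the sequence is strictly positive, while the limit has the facial set $\{2, 3, 6\}$ as its support.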


An extended relational model can also be defined as a closure of the exponential family
corresponding to the original model. The closure of exponential families using the
Kullback-Leibler divergence was described for regular families by Brown [1988], among others, and
for full families by Csiszár and Matúš [2003]. However, both of these approaches rely on the
presence of the overall effect, which implies, through the possibility of normalization, that
the Kullback-Leibler divergence is non-negative and Pinsker's inequality [cf. Csiszár, 1975]
holds. In the generality considered in the present paper, the approach does not apply, and
the Bregman divergence is used to define the closure.

Let $D(\cdot\|\cdot)$ denote the Bregman divergence between two vectors $t, u \in \mathbb{R}^{|\mathcal{I}|}_{>0}$, associated
with the function $f(x) = \sum_{i \in \mathcal{I}} x(i)\log x(i)$:

$$D(t\|u) = \sum_{i \in \mathcal{I}} t(i)\log\big(t(i)/u(i)\big) - \Big(\sum_{i \in \mathcal{I}} t(i) - \sum_{i \in \mathcal{I}} u(i)\Big). \tag{15}$$

Under the convention $0 \cdot \log 0 = 0$, $D(t\|u)$ is also defined for non-negative $t$ and $u$ if
$supp(t) \subseteq supp(u)$. The function $D(t\|u)$ is non-negative, and $D(t\|u) = 0$ if and only if
$t = u$. For any $u \in \mathbb{R}^{|\mathcal{I}|}_{\geq 0}$ and for any convex set $S \subseteq \mathbb{R}^{|\mathcal{I}|}_{\geq 0}$ there exists a unique $u^* \in \mathbb{R}^{|\mathcal{I}|}_{\geq 0}$,
such that

$$D(u^*\|u) = \min_{z \in S} D(z\|u), \tag{16}$$

see Bregman [1967]. This $u^*$ is called the D-projection, or the Bregman projection, of $u$
on $S$. If $p_1$ and $p_2$ are probability distributions, then $D(p_1\|p_2)$ is the Kullback-Leibler
divergence.
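
A short sketch may help to see why the correction term in (15) matters when totals differ. The function names `bregman_divergence` and `kl_term` are assumptions made for this illustration, and the vectors are hypothetical.

```python
from math import log

def kl_term(t, u):
    """sum_i t(i) log(t(i)/u(i)), with the convention 0 * log 0 = 0."""
    return sum(ti * log(ti / ui) for ti, ui in zip(t, u) if ti > 0)

def bregman_divergence(t, u):
    """D(t||u) as in (15); requires supp(t) to be contained in supp(u)."""
    if any(ti > 0 and ui <= 0 for ti, ui in zip(t, u)):
        raise ValueError("supp(t) must be contained in supp(u)")
    return kl_term(t, u) - (sum(t) - sum(u))

# Two intensity vectors with different totals (hypothetical values).
t, u = [1.0, 2.0], [2.0, 4.0]
print("log-ratio term alone:", kl_term(t, u))            # negative here
print("Bregman divergence:  ", bregman_divergence(t, u))  # non-negative

# For probability vectors the two quantities coincide.
p1, p2 = [0.5, 0.5], [0.25, 0.75]
print(kl_term(p1, p2), bregman_divergence(p1, p2))
```

The first pair shows that the log-ratio term alone can be negative when the totals differ, which is the situation that arises for models for intensities without the overall effect.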

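Let $cl_D(RM_\delta(A))$ be the closure of $RM_\delta(A)$ with respect to the Bregman divergence:

$$cl_D(RM_\delta(A)) = \Big\{\delta \in \mathbb{R}^{|\mathcal{I}|}_{\geq 0} : \ \exists\, \delta^{(n)} \in RM_\delta(A), \ n \in \mathbb{N}, \text{ such that } D(\delta\|\delta^{(n)}) \to 0 \text{ as } n \to \infty\Big\}.$$

Theorem 3.2. The closures of the relational model $RM_\delta(A)$ according to the pointwise
convergence and to the Bregman divergence coincide.

Proof. Let $\delta^* \in \overline{RM}_\delta(A)$. Then, there exists a sequence $\delta^{(n)} \in RM_\delta(A)$ such that $\delta^{(n)} \to \delta^*$
pointwise, as $n \to \infty$. The function $D(\delta^*\|\delta)$ is defined and continuous for $\delta > 0$, even
if some of the components of $\delta^*$ are zero. Therefore, $D(\delta^*\|\delta^{(n)}) \to 0$, as $n \to \infty$.

Suppose $\delta^* \in cl_D(RM_\delta(A))$, and, thus, there exists a sequence $\delta^{(n)} \in RM_\delta(A)$, such that:

$$D(\delta^*\|\delta^{(n)}) \to 0 \text{ as } n \to \infty.$$

Therefore, $D(\delta^*\|\delta^{(n)}) < 1$ for all large enough $n$. Because the set $\{\delta > 0 : D(\delta^*\|\delta) < 1\}$ is
compact [Bregman, 1967], there exists a subsequence $\delta^{(n_k)}$ that converges pointwise
to $\delta^*$, as $k \to \infty$. □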
A relational model $RM_\delta(A)$ is a multiplicative family of distributions; the conditions
under which the extended model $\overline{RM}_\delta(A)$ is also a multiplicative family are studied next.

A distribution $\delta \geq 0$ is said to factor according to a matrix $A$ if it has a representation
given in (3), with $\theta = (\theta_1, \dots, \theta_J)' \geq 0$. Every distribution in a relational model factors
according to the model matrix. However, as the next example demonstrates, an extended
model may contain distributions which do not factor according to one choice of the model
matrix but do factor according to a different choice.

Example 2.2 (revisited): Any distribution in $RM_p(A)$ factors according to $A$, that is,

$$p = (\theta_1\theta_2\theta_3, \ \theta_1\theta_2, \ \theta_1, \ \theta_3, \ \theta_1\theta_2\theta_3)' \tag{17}$$

for some $\theta_1, \theta_2, \theta_3 > 0$. The non-negative distribution $p_q = (1/8, 1/2, 0, 1/4, 1/8)'$ does not
have the multiplicative structure (17), but is in the extended model. To show the latter,
take

$$\theta_1^{(n)} = \frac{3}{3n+4}, \quad \theta_2^{(n)} = \frac{n}{2}, \quad \theta_3^{(n)} = \frac{1}{4}, \quad n \geq 1.$$

Then, the sequence

$$p^{(n)} = \Big(\frac{3n}{8(3n+4)}, \ \frac{3n}{2(3n+4)}, \ \frac{3}{3n+4}, \ \frac{1}{4}, \ \frac{3n}{8(3n+4)}\Big)'$$

is in the model, and $\lim_{n \to \infty} p^{(n)} = p_q$. On the other hand, $p_q$ factors according to the
matrix

$$A_1 = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 \end{pmatrix},$$

which generates the same extended model as $A$ does, because $Ker(A) = Ker(A_1)$. □


A necessary and sufficient condition for the existence of such a factorization for a
distribution in an extended relational model is given next.

Theorem 3.3. A distribution $\delta \in \overline{RM}_\delta(A)$ factors according to $A$ if and only if for any $i_0 \notin
supp(\delta)$ there exists an index $j = j(i_0) \in \{1, \dots, J\}$ such that $a_{j i_0} = 1$ and $a_{ji} = 0$ for all
$i \in supp(\delta)$. □

The condition of the theorem, called the A-feasibility of $supp(\delta)$, means that a generating



apply here. 
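
The A-feasibility condition can be checked mechanically. The sketch below reads the condition of Theorem 3.3 as follows (an interpretation made for this illustration): every cell outside the support must belong to at least one generating subset that contains no cell of the support. It is applied to the two matrices from the revisited Example 2.2; the function name `is_A_feasible` is hypothetical.

```python
def is_A_feasible(A, support):
    """support: set of 1-based indices of cells with a positive parameter."""
    n_cells = len(A[0])
    for i0 in range(1, n_cells + 1):
        if i0 in support:
            continue
        # Look for a generating subset that contains i0 and misses the support.
        has_subset = any(
            row[i0 - 1] == 1 and all(row[i - 1] == 0 for i in support)
            for row in A
        )
        if not has_subset:
            return False
    return True

A = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
A1 = [
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
support = {1, 2, 4, 5}  # support of p_q = (1/8, 1/2, 0, 1/4, 1/8)'

print("feasible for A: ", is_A_feasible(A, support))
print("feasible for A1:", is_A_feasible(A1, support))
```

With $A_1$, the first generating subset consists of the third cell only, so its parameter can be set to zero without affecting the cells in the support; no subset of $A$ has this property.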


Maximum likelihood estimation in the extended relational model is studied next. 


4 MLE in the extended model 


Let $F$ be a facial set, and let $A_F$ denote the sub-matrix of $A$ comprising the columns with
indices in $F$, and $\delta_F$ denote the sub-vector of $\delta$ with indices in $F$. The following result
extends Theorem 9 in Fienberg and Rinaldo [2012].

Theorem 4.1. Let $y$ be the vector of observed frequencies under Poisson or multinomial
sampling, and let $RM_\delta(A)$ be a relational model. Consider $q$ defined in (7), and assume
that $supp(q) \subsetneq \mathcal{I}$.

(i) If for all facial sets $F$, $supp(q) \not\subseteq F$, then the MLE $\hat{\delta}_y$ under the model $RM_\delta(A)$ exists,
and is also the MLE under $\overline{RM}_\delta(A)$: $\tilde{\delta}_y = \hat{\delta}_y$. Otherwise,

(ii) Let $F$ be the smallest facial set such that $supp(q) \subseteq F$. Then the MLE $\hat{\delta}_{y,F}$ of $\delta_F$
under the model $RM_{\delta_F}(A_F)$ exists, and $\tilde{\delta}_y = (\hat{\delta}_{y,F}, 0_{\mathcal{I} \setminus F})$ is the MLE under the model
$\overline{RM}_\delta(A)$.

(iii) The MLE $\tilde{\delta}_y$ under $\overline{RM}_\delta(A)$ always exists and is the unique point of $X_A$ which satisfies:

$$A\delta = \gamma A q, \text{ for some } \gamma > 0; \qquad 1'\delta = 1 \ (\text{only for } \delta = p). \tag{18}$$

The vector $\tilde{\delta}_y$ is called the extended MLE of $\delta$ under the relational model. The proof is
given in the Appendix. The following example illustrates the theorem.


Example 2.2 (revisited):

Notice first that $F = \{1, 2, 4, 5\}$ is a facial set of $A$. The support of the observed
distribution $supp(q) = \{1, 2, 4\}$ is a subset of $F$. Therefore, the MLE of $q$ exists in the closure
of the relational model. As it was shown earlier, the distribution $p_q = (1/8, 1/2, 0, 1/4, 1/8)'$
is in $\overline{RM}_p(A)$. As $Ap_q = (7/8)Aq$, the extended MLE of $q$ is $p_q$. □
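
The arithmetic of this example can be confirmed with exact fractions. The sketch below checks the facial-set certificate for $F = \{1, 2, 4, 5\}$, using the vector $c = (1, -1, 0)'$ (a choice made for this illustration), and the relation $Ap_q = (7/8)Aq$; it does not re-derive that $p_q$ belongs to the extended model, which was shown earlier by a limiting argument.

```python
from fractions import Fraction as Fr

A = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
q = [Fr(3, 7), Fr(3, 7), Fr(0), Fr(1, 7), Fr(0)]
p_q = [Fr(1, 8), Fr(1, 2), Fr(0), Fr(1, 4), Fr(1, 8)]

def subset_sums(A, v):
    """Return A v: one sum per generating subset."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Certificate for the facial set: c'a_i = 0 for i in F and c'a_i > 0 otherwise.
c = [1, -1, 0]
dots = [sum(ck * row[i] for ck, row in zip(c, A)) for i in range(5)]
F = {i + 1 for i, v in enumerate(dots) if v == 0}

gamma = Fr(7, 8)
lhs = subset_sums(A, p_q)
rhs = [gamma * s for s in subset_sums(A, q)]

print("c'a_i:", dots, "-> F =", sorted(F))
print("A p_q      =", lhs)
print("(7/8) A q  =", rhs)
print("total of p_q:", sum(p_q))
```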


The next theorem establishes a condition under which the maximum likelihood estimates
of the model parameters under an extended relational model exist:

Theorem 4.2. Assume that the MLE $\tilde{\delta}$ under the extended relational model $\overline{RM}_\delta(A)$ exists.
The maximum likelihood estimates of the model parameters $\theta$ exist if and only if $supp(\tilde{\delta})$ is
A-feasible.

Proof. By Theorem 3.3, the distribution $\tilde{\delta}$ factors according to $A$ if and only if $supp(\tilde{\delta})$ is
A-feasible. In this case $\tilde{\delta}(i) = \prod_{j=1}^{J} \hat{\theta}_j^{a_{ji}}$, $i \in \mathcal{I}$, and, by uniqueness, $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_J)'$
are the maximum likelihood estimates of the model parameters. □

If $supp(\tilde{\delta})$ is not A-feasible, then $\tilde{\delta}$ is the limit of a sequence of the positive distributions
in the model which factor according to $A$. Although the cell parameters of these distributions
can be factored using some model parameters $\theta^{(n)} > 0$, the limits of individual components
of $\theta^{(n)}$, as $n \to \infty$, may not exist. In the case of the log-linear models this fact was illustrated
by Rinaldo [2006]. The same situation occurs in the construction of Example 2.2 where
$\theta_2^{(n)} \to \infty$ as $n \to \infty$.

As Theorem 4.1 implies, the MLE in the extended relational model can be obtained
using the MLE in a non-extended model. Klimova and Rudas [2015] proposed a generalized
iterative scaling procedure, called G-IPF, for computing the MLE under (non-extended)
relational models. The algorithm relies on the condition that $Aq > 0$. Every iteration of
this procedure implements the following algorithm, IPF($\gamma$), for a specific value of $\gamma$.


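IPF($\gamma$) Algorithm:

Set $n = 0$; $\delta^{(0)}(i) = 1$ for all $i \in \mathcal{I}$, and proceed as follows.

Step 1: Find $j \in \{1, 2, \dots, J\}$, such that $n + 1 \equiv j \mod J$;

Step 2: Compute

$$\delta^{(n+1)}(i) = \delta^{(n)}(i)\left(\frac{\gamma A_j q}{A_j \delta^{(n)}}\right)^{a_{ji}} \quad \text{for all } i \in \mathcal{I}. \tag{19}$$

Step 3: While $\gamma A_j q \neq A_j \delta^{(n+1)}$ for at least one $j$, set $n = n + 1$, go to Step 1.

Step 4: Set $\delta^*_\gamma = \delta^{(n+1)}$, and finish. □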
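
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the gIPFrm implementation: the function names, the numerical tolerance that replaces the exact equality in Step 3, and the iteration cap are all choices made here. The data are the observed frequencies of Table 1, treated as Poisson counts, so that $q = y$ and $\gamma = 1$.

```python
def subset_sums(A, v):
    """Return A v: one sum per generating subset (row of A)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def ipf(A, q, gamma=1.0, tol=1e-9, max_iter=100_000):
    """Cyclic scaling as in (19): rescale the cells of subset j so that
    A_j delta equals gamma * A_j q. Assumes A q > 0 and no zero column in A."""
    target = [gamma * s for s in subset_sums(A, q)]
    delta = [1.0] * len(A[0])
    for n in range(max_iter):
        j = n % len(A)  # subsets are visited cyclically (0-based index)
        factor = target[j] / sum(a * d for a, d in zip(A[j], delta))
        delta = [d * factor if a else d for a, d in zip(A[j], delta)]
        # Step 3, with a tolerance in place of exact equality (an assumption).
        if all(abs(s - t) <= tol for s, t in zip(subset_sums(A, delta), target)):
            return delta
    raise RuntimeError("IPF did not reach the requested tolerance")

# Model matrix (9) and the observed frequencies of Table 1,
# cells ordered F, N, O, FN, FO, NO, FNO.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
y = [10, 14, 25, 5, 3, 16, 27]

lam = ipf(A, y)
print([round(x, 2) for x in lam], "total:", round(sum(lam), 2))
```

Each update multiplies all cells of one generating subset by a common factor, so every iterate keeps the multiplicative structure of the model; the result can be compared with the Poisson rows of Table 1.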


The G-IPF algorithm commences with executing IPF($\gamma$) for $\gamma = 1$, which is sufficient
to compute the MLE in the case of probabilities with the overall effect and in the case of
intensities. If in the case of probabilities the overall effect is not present, G-IPF updates
$\gamma$ and calls IPF($\gamma$) again. The procedure is repeated until, for some $\gamma$, the limit vector $\delta^*_\gamma$
sums to 1, and thus is a parameter of a non-negative probability distribution. The variant
of G-IPF, which employs the bisection method to update $\gamma$, is described in the following.


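G-IPF Algorithm:

If $\delta = \lambda$, compute $\hat{\lambda}$ using IPF(1), and finish.

If $\delta = p$, compute $p^*$ using IPF(1).

If $1'p^* = 1$, set $\hat{p} = p^*$, and finish. Otherwise,

compute $\gamma_L = (1'Aq)^{-1}$, $\gamma_R = \min\{1/A_1q, \dots, 1/A_Jq\}$, and proceed as follows:

Step 1: Find $p^*_\gamma$, with $\gamma = (\gamma_L + \gamma_R)/2$, using IPF($\gamma$).

Step 2: While $1'p^*_\gamma \neq 1$,
if $1'p^*_\gamma < 1$ set $\gamma_L = \frac{\gamma_L + \gamma_R}{2}$,
else set $\gamma_R = \frac{\gamma_L + \gamma_R}{2}$;
go to Step 1.

Step 3: Set $\hat{p} = p^*_{(\gamma_L + \gamma_R)/2}$, and finish. □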
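
A sketch of the bisection variant for probabilities is given below. It is an illustration under stated assumptions rather than the published implementation: the function names, the tolerances and the stopping rule are choices made here, and the direction of the bisection update assumes that the total of the IPF($\gamma$) limit increases with $\gamma$. The data are the observed frequencies of Table 1, normalized to a probability distribution.

```python
def subset_sums(A, v):
    """Return A v: one sum per generating subset (row of A)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def ipf(A, target, tol=1e-12, max_iter=200_000):
    """Cyclic scaling towards the subset sums in `target` (all positive)."""
    delta = [1.0] * len(A[0])
    for n in range(max_iter):
        j = n % len(A)
        factor = target[j] / sum(a * d for a, d in zip(A[j], delta))
        delta = [d * factor if a else d for a, d in zip(A[j], delta)]
        if all(abs(s - t) <= tol for s, t in zip(subset_sums(A, delta), target)):
            return delta
    raise RuntimeError("IPF did not reach the requested tolerance")

def g_ipf(A, q, tol=1e-10):
    """Return (p, gamma) with A p = gamma * A q and the total of p equal to 1."""
    Aq = subset_sums(A, q)
    p = ipf(A, Aq)  # IPF(1)
    if abs(sum(p) - 1.0) <= tol:
        return p, 1.0
    lo, hi = 1.0 / sum(Aq), min(1.0 / s for s in Aq)  # gamma_L and gamma_R
    while True:
        gamma = (lo + hi) / 2.0
        p = ipf(A, [gamma * s for s in Aq])
        total = sum(p)
        if abs(total - 1.0) <= tol or hi - lo <= 1e-14:
            return p, gamma
        if total < 1.0:
            lo = gamma  # total too small: increase gamma
        else:
            hi = gamma  # total too large: decrease gamma

A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
y = [10, 14, 25, 5, 3, 16, 27]
q = [v / sum(y) for v in y]

p_hat, gamma_hat = g_ipf(A, q)
print("gamma:", round(gamma_hat, 4))
print("100 * p:", [round(100 * v, 2) for v in p_hat])
```

The computed adjustment factor can be compared with the ratio of about 0.7 mentioned in Example 2.1, and the fitted values with the multinomial rows of Table 1.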


If $Aq > 0$, the G-IPF algorithm applies to the extended case directly.

Theorem 4.3. Let $y$ be the vector of observed frequencies under Poisson or multinomial
sampling, with $q$ defined in (7), and let $RM_\delta(A)$ be a relational model. Assume that $Aq > 0$.
The G-IPF algorithm converges to the MLE $\tilde{\delta}_y$ under $\overline{RM}_\delta(A)$.

Proof. As $Aq > 0$, the IPF-sequence $\delta^{(n)}_\gamma$ defined in (19) is positive, and the proof of its
convergence in [Klimova and Rudas, 2015, Theorem 3.2] applies. In particular, the limit
of the sequence, $\delta^*_\gamma$, satisfies $A\delta^*_\gamma = \gamma Aq$, and, for an arbitrary kernel basis matrix $D$,
$D\log \delta^{(n)}_\gamma = 0$ for all $n \in \mathbb{Z}_{\geq 0}$. The latter implies that $\delta^{(n)}_\gamma \in X_A$ for all $n$, and, as $X_A$ is a
closed set in $\mathbb{R}^{|\mathcal{I}|}_{\geq 0}$, $\delta^*_\gamma \in X_A$.

Let $\delta^*_1$ be the limit vector obtained from IPF(1), and thus $\delta^*_1 \in X_A$ and $A\delta^*_1 = Aq$.

Suppose $\delta = \lambda$. Then, as (18) holds for $\delta^*_1$ with $\gamma = 1$, Theorem 4.1(iii) implies that $\delta^*_1$
is equal to the extended MLE: $\tilde{\lambda}_y = \delta^*_1$.

Suppose $\delta = p$. First, assume that the overall effect is present, and thus there exists
a $k \in \mathbb{R}^{J}$, such that $1' = k'A$. The latter yields that $1'\delta^*_1 = k'A\delta^*_1 = k'Aq = 1'q = 1$.
Therefore, (18) holds for $\delta^*_1$ with $\gamma = 1$. By Theorem 4.1(iii), $\tilde{p}_y = \delta^*_1$.

Now, assume that the overall effect is not present. In this situation, G-IPF updates $\gamma$
and calls IPF($\gamma$); and this procedure is repeated until a $\gamma^*$ for which the IPF-limit $\delta^*_{\gamma^*}$ sums
to 1 is found. Then, $\delta^*_{\gamma^*}$ satisfies (18) with $\gamma = \gamma^*$. By Theorem 4.1(iii), $\tilde{p}_y = \delta^*_{\gamma^*}$. □


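Next, it is shown how G-IPF can be used if the condition $Aq > 0$ does not hold. Let
$J_0 = \{j \in \{1, \dots, J\} : A_j q = 0\}$, and assume that $J_0 \neq \emptyset$. Further, let $\mathcal{I}_0 = \{i \in \mathcal{I} : \exists j \in
J_0, \ a_{ji} = 1\}$, and let $\mathcal{I}^* = \mathcal{I} \setminus \mathcal{I}_0$. Denote by $A^*$ the matrix obtained from $A$ by removing
the columns with indices in $\mathcal{I}_0$ and by removing the zero rows, if such occur afterwards, and
by $\delta^*$, $y^*$, and $q^*$ the corresponding sub-vectors of $\delta$, $y$, and $q$. By Theorem 4.1(iii), the
MLE $\tilde{\delta}_{y^*}$ of $\delta^*$ under $\overline{RM}_{\delta^*}(A^*)$ exists and is unique. Since $A^*q^* > 0$, $\tilde{\delta}_{y^*}$ can be computed
using G-IPF, see Theorem 4.3, and the following holds:

Theorem 4.4. The MLE of $y$ under $\overline{RM}_\delta(A)$ is equal to $\tilde{\delta}_y = (\tilde{\delta}_{y^*}, 0_{\mathcal{I}_0})$.

Proof. In order to show that $\tilde{\delta}_y \in X_A$, it will first be verified that $\mathcal{I}^*$ is a facial set of $A$. Let
$a_i$ be the $i$-th column of $A$, then, with $c = (0_{J \setminus J_0}, 1_{J_0})'$, $c'a_i = 0$ for any $i \in \mathcal{I}^*$. If $i \notin \mathcal{I}^*$,
then $a_{ji} = 1$ for some $j \in J_0$, and thus $c'a_i > 0$. Therefore, $\mathcal{I}^*$ is a facial set of $A$. Then, by
Lemma A.3, $\tilde{\delta}_y \in X_A$.

Next, in the case of probabilities, the normalization condition $1'\tilde{\delta}_{y^*} = 1$ implies that
$1'\tilde{\delta}_y = 1$. Further, $A^*\tilde{\delta}_{y^*} = \gamma A^*q^*$ implies that $A\tilde{\delta}_y = \gamma Aq$.

Finally, by Theorem 4.1(iii), $\tilde{\delta}_y$ is the MLE of $y$ under $\overline{RM}_\delta(A)$. □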
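
The reduction that precedes Theorem 4.4 can be sketched as follows. The function name `reduce_model` and the data are hypothetical: the frequencies describe a situation, invented for this illustration, in which no purchase contains an item from the F group under the AS-independence model (9). The exact test for a zero subset sum assumes integer counts or exact fractions.

```python
def reduce_model(A, q):
    """Remove the cells covered by generating subsets with zero observed sum,
    then drop rows that became zero. Returns (kept 1-based cells, A*, q*)."""
    J0 = [j for j, row in enumerate(A) if sum(a * x for a, x in zip(row, q)) == 0]
    I0 = {i for i in range(len(q)) if any(A[j][i] == 1 for j in J0)}
    keep = [i for i in range(len(q)) if i not in I0]
    A_star = [[row[i] for i in keep] for row in A]
    A_star = [row for row in A_star if any(row)]  # remove zero rows
    q_star = [q[i] for i in keep]
    return [i + 1 for i in keep], A_star, q_star

# Model matrix (9); cells ordered F, N, O, FN, FO, NO, FNO.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
y = [0, 14, 25, 0, 0, 16, 0]  # hypothetical Poisson counts, so q = y

kept_cells, A_star, q_star = reduce_model(A, y)
print("kept cells:", kept_cells)
print("A* =", A_star)
print("q* =", q_star)
```

The subset sums of the reduced model are positive, so G-IPF can be applied to $A^*$ and $q^*$, and the resulting estimate is then padded with zeros in the removed cells, as in Theorem 4.4.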


5 Conclusion 

Some research areas deal with populations of a complex structure to which inference based 
on the standard log-linear approach does not apply, but the relational model framework 
can be used. The relational models are more flexible as they allow effects associated with 
arbitrary subsets of cells, can be used for incomplete tables, and do not require the presence 
of an overall effect. Similarly to the log-linear case, data with zero counts may not possess 
an MLE under a relational model. A necessary and sufficient condition for the existence of 
the MLE was obtained in Section 2. When this condition does not hold, an MLE may exist
in the extended sense, that is, in the closure of the relational model. Different but equivalent
ways of defining such a closure, and a necessary and sufficient condition for the existence of
the extended MLE in it were presented in Section 3. A condition under which a distribution
in the closure factorizes according to the model matrix was also given. These results were
obtained using concepts and methods of algebraic statistics. Just like in the case of relational
models, the cases of multinomial and Poisson sampling are not equivalent. It was shown in
Section 4 that the generalized iterative proportional fitting procedure originally suggested
for relational models also works when the data contain zeros and the MLE is sought in
the closure of a relational model.


A Appendix 


A.1 Properties of facial sets


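Lemma A.1. Let $A$ be the model matrix of a relational model, and let $F$ be a facial set of $A$. Then:

(i) There exists a $c \in \mathbb{R}^J$ such that $c'a_i = 0$ for any $i \in F$ and $c'a_i > 0$ for any $i \notin F$.

(ii) For any $d \in \mathrm{Ker}(A)$, either both $\mathrm{supp}(d^+) \subseteq F$ and $\mathrm{supp}(d^-) \subseteq F$, or both $\mathrm{supp}(d^+) \not\subseteq F$ and $\mathrm{supp}(d^-) \not\subseteq F$.

(iii) For any $\delta \in X_A$, either $\mathrm{supp}(\delta) = I$ or $\mathrm{supp}(\delta)$ is a facial set of $A$.

(iv) If $F$ is a facial set of $A$, there exists a $\delta \in X_A$ such that $\mathrm{supp}(\delta) = F$.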


The statements of the lemma were proved by Geiger et al. [2006] and Rauh, Kahle, and Ay [2011] for models of type (3) when the overall effect is present. Their proofs do not rely on the latter characteristic and thus apply here.

The next lemma shows that the condition of existence of the MLE given in Theorem 2.2 can also be formulated in terms of facial sets.


Lemma A.2. There exists a $z > 0$, such that $Az = Aq$, if and only if $\mathrm{supp}(q)$ is not contained in any facial set of $A$.

Proof. Suppose there exists a $z > 0$, such that $Az = Aq$, and thus $d = z - q \in \mathrm{Ker}(A)$ and $q + d > 0$.

Let $F$ be a facial set of $A$. If both $\mathrm{supp}(d^+) \subseteq F$ and $\mathrm{supp}(d^-) \subseteq F$, then $d_i = 0$ for all $i \notin F$. Because $q + d > 0$, $q_i + d_i = q_i > 0$ for all $i \notin F$. Therefore, $\mathrm{supp}(q)$ is not contained in $F$. Otherwise, see Lemma A.1, both $\mathrm{supp}(d^+) \not\subseteq F$ and $\mathrm{supp}(d^-) \not\subseteq F$, and there exists an $i \notin F$ such that $d_i < 0$. If $q_i$ were zero, then $q_i + d_i$ would be negative, which contradicts the initial assumption $q + d > 0$. Therefore, $q_i$ has to be positive, which implies that $\mathrm{supp}(q)$ is not contained in $F$.

To prove the converse, assume that $\mathrm{supp}(q)$ is not contained in any facial set $F$. Suppose the equation $Aq = Az$ has no (strictly) positive solution in $z$, and, therefore, $Aq \notin \mathrm{relint}(C_A)$. A non-negative solution always exists, and thus $Aq$ belongs to a face of $C_A$. Then (11) holds for $t = Aq$ for some facial set $F$; without loss of generality, $F = \{1, \dots, f\}$:

$$Aq = s_1 a_1 + \dots + s_f a_f.$$

Hence,

$$(q_1 - s_1) a_1 + \dots + (q_f - s_f) a_f + q_{f+1} a_{f+1} + \dots + q_{|I|} a_{|I|} = 0. \qquad (20)$$

Multiplying both sides of (20) by a vector $c$, such that $c'a_i = 0$ for $i \in F$ and $c'a_i > 0$ for $i \notin F$, leads to:

$$q_{f+1} = 0, \ \dots, \ q_{|I|} = 0,$$

which means that $\mathrm{supp}(q) \subseteq F$. This contradicts the initial assumption that $\mathrm{supp}(q)$ is not contained in any facial set. □
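The criterion of Lemma A.2 can be illustrated with a small Python sketch. It assumes that a candidate vector $c$ is supplied by the user and relies on the description of facial sets through such a vector (as in Lemma A.1(i) and in the proof of Theorem 4.4). The function names and the example matrix are hypothetical. Finding a suitable $c$ in general requires linear programming, which is not attempted here.

```python
def exposed_facial_set(A, c):
    """Return F = {i : c'a_i = 0} if c'a_i >= 0 for every column a_i of A.

    Returns None if some c'a_i is negative, or if c'a_i = 0 for all columns
    (in that case c does not single out a proper subset of the cells).
    """
    n_rows, n_cols = len(A), len(A[0])
    values = [sum(c[j] * A[j][i] for j in range(n_rows))
              for i in range(n_cols)]
    if any(v < 0 for v in values):
        return None
    F = [i for i, v in enumerate(values) if v == 0]
    if len(F) == n_cols:
        return None
    return F


def support(q):
    """Indices of the strictly positive entries of q."""
    return [i for i, x in enumerate(q) if x > 0]


def support_in_facial_set(q, F):
    """True if supp(q) is contained in F."""
    return set(support(q)) <= set(F)


# Hypothetical example: three subsets of a four-cell table.
A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]
F = exposed_facial_set(A, [0, 0, 1])        # cells 0, 1, 2 (0-based)

# supp(q) lies inside F, so by Lemma A.2 no z > 0 solves Az = Aq.
blocked = support_in_facial_set([3, 2, 5, 0], F)

# Here supp(q) is not inside this particular F; other facial sets
# would still have to be checked before concluding that z > 0 exists.
not_blocked_by_F = support_in_facial_set([3, 2, 5, 1], F)
```

In the first data set the third row of $Aq$ is zero, while the third row of $Az$ is positive for every $z > 0$, which agrees with the conclusion of the lemma.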

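The following lemma is used in the proofs of Theorems 4.1 and 4.4.

Lemma A.3. If $F$ is a facial set of $A$, then, for any $\delta_F \in X_{A_F}$, $\delta = (\delta_F, 0_{I \setminus F}) \in X_A$.

Proof. Take an arbitrary $d \in \mathrm{Ker}(A)$. As $F$ is a facial set of $A$, by Lemma A.1(ii), exactly one of the following holds:

$\mathrm{supp}(d^+) \subseteq F$ and $\mathrm{supp}(d^-) \subseteq F$, or $\mathrm{supp}(d^+) \not\subseteq F$ and $\mathrm{supp}(d^-) \not\subseteq F$.

In the first case, there exists a $d_F \in \mathrm{Ker}(A_F)$, such that $d = (d_F, 0_{I \setminus F})$. Since $\delta_F \in X_{A_F}$, $(\delta_F)^{d_F^+} = (\delta_F)^{d_F^-}$, and, therefore, with the convention $0^0 = 1$,

$$\delta^{d^+} = (\delta_F)^{d_F^+} = (\delta_F)^{d_F^-} = \delta^{d^-}.$$

In the second case, there exist $i_1, i_2 \notin F$ such that $d_{i_1} > 0$ and $d_{i_2} < 0$; thus,

$$\delta^{d^+} = (\delta_F)^{d_F^+} \cdot 0 = (\delta_F)^{d_F^-} \cdot 0 = \delta^{d^-}.$$

As $\delta^{d^+} = \delta^{d^-}$ for any $d \in \mathrm{Ker}(A)$, $\delta \in X_A$. □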
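The first case of the proof of Lemma A.3 can be checked numerically on a toy example. The sketch below is illustrative only: the matrix, the kernel vector, and the function names are hypothetical, and exact rational arithmetic from the standard `fractions` module is used so that the two monomials can be compared without rounding. For this particular matrix the kernel is one-dimensional, so checking its generator is enough.

```python
from fractions import Fraction


def in_kernel(A, d):
    """True if A d = 0."""
    return all(sum(a * x for a, x in zip(row, d)) == 0 for row in A)


def monomial(delta, exponents):
    """Compute prod_i delta_i ** e_i; Fraction(0) ** 0 evaluates to 1."""
    result = Fraction(1)
    for value, e in zip(delta, exponents):
        result *= value ** e
    return result


def satisfies_binomial(delta, d):
    """Check delta^(d+) == delta^(d-) for an integer vector d."""
    d_plus = [max(x, 0) for x in d]
    d_minus = [max(-x, 0) for x in d]
    return monomial(delta, d_plus) == monomial(delta, d_minus)


# Hypothetical example: F = {0, 1, 2} (0-based) is exposed by c = (0, 0, 1).
A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]
d = [1, 1, -1, 0]            # generator of Ker(A); supported inside F

# delta_F satisfies the binomial of the sub-model: delta_0 * delta_1 = delta_2.
delta_F = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)]
delta = delta_F + [Fraction(0)]          # padded with a zero outside F

padded_ok = in_kernel(A, d) and satisfies_binomial(delta, d)

# A vector that violates the binomial on F, for contrast.
bad = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 4), Fraction(0)]
bad_ok = satisfies_binomial(bad, d)
```

Here the zero component enters both monomials with exponent zero, which is exactly why padding with zeros does not disturb the binomial equation.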


A.2 Proof of Theorem 3.1

The proof extends the arguments given by Geiger et al. [2006] and Rauh et al. [2011]. It will be shown first that for any distribution in $\overline{RM}_\delta(A)$ there exists a sequence of distributions in $RM_\delta(A)$ that converges to it pointwise.

Let $\delta^* \in \overline{RM}_\delta(A)$. By Lemma A.1, as $\delta^* \in X_A$, $F = \mathrm{supp}(\delta^*)$ is either $I$ or a facial set of $A$. If $F = I$, then $\delta^* > 0$, and the statement holds with $\delta^{(n)} = \delta^*$. Assume that $F \subsetneq I$. For simplicity of exposition, let $F = \{1, \dots, f\}$, and then $\delta^* = (\delta_1^*, \dots, \delta_f^*, 0, \dots, 0)$.

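First, find $\theta_1, \dots, \theta_J > 0$ that satisfy:

$$\prod_{j=1}^{J} \theta_j^{a_{ji}} = \delta_i^*, \quad i \in F.$$

The existence of such $\theta$'s can be proved using the same argument as Geiger et al. [2006, p.28] gave for the case of extended log-linear models. By Lemma A.1, there exists a $c = (c_1, \dots, c_J)' \in \mathbb{R}^J$ such that $c'a_i = 0$ for all $i \in F$ and $c'a_i > 0$ for any $i \notin F$. Order the columns of $A$ so that $c_1 > 0$, and then order the rows of $A$ so that $a_{11} = 1$.

If $\delta = \lambda$, set, for $n \in \mathbb{Z}_{>0}$,

$$\lambda_i^{(n)} = n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}}, \quad i \in I.$$

The distribution $\lambda^{(n)} = (\lambda_1^{(n)}, \dots, \lambda_{|I|}^{(n)})'$ is positive and satisfies (3) with $n^{-c_j}\theta_j$ in place of $\theta_j$. Therefore, $\lambda^{(n)} \in RM_\lambda(A)$. Further,

$$\lim_{n \to \infty} \lambda_i^{(n)} = \lim_{n \to \infty} n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}} = \begin{cases} \delta_i^*, & \text{if } i \in F, \\ 0, & \text{if } i \notin F, \end{cases}$$

thus $\lambda^{(n)} \to \delta^*$ pointwise, as $n \to \infty$.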
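The behaviour of the Poisson-case sequence, whose $i$-th component is $n^{-c'a_i} \prod_j \theta_j^{a_{ji}}$, can be illustrated with a short Python sketch. The matrix, the vector $c$, and the values of $\theta$ below are hypothetical choices made only for illustration, and `lambda_sequence` is not a name used in the paper.

```python
def lambda_sequence(A, theta, c, n):
    """Return the vector with entries n**(-c'a_i) * prod_j theta_j**a_ji."""
    n_rows, n_cols = len(A), len(A[0])
    result = []
    for i in range(n_cols):
        c_dot_a = sum(c[j] * A[j][i] for j in range(n_rows))
        product = 1.0
        for j in range(n_rows):
            product *= theta[j] ** A[j][i]
        result.append(float(n) ** (-c_dot_a) * product)
    return result


# Hypothetical example: F = {0, 1, 2} (0-based) is exposed by c = (0, 0, 1).
A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]
c = [0, 0, 1]
theta = [0.5, 0.4, 2.0]

for n in (1, 10, 1000):
    print(n, lambda_sequence(A, theta, c, n))
```

For cells in $F$ the exponent $c'a_i$ is zero, so those components do not depend on $n$; the component outside $F$ equals $2/n$ in this example and tends to zero, so every element of the sequence is positive while the limit has support $F$.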

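If $\delta = p$, take

$$\theta_1^{(n)} = \frac{1 - \sum_{i:\, a_{1i} = 0} \prod_{j=2}^{J} (n^{-c_j}\theta_j)^{a_{ji}}}{\sum_{i:\, a_{1i} = 1} \prod_{j=2}^{J} (n^{-c_j}\theta_j)^{a_{ji}}},$$

and set

$$p_i^{(n)} = (\theta_1^{(n)})^{a_{1i}} \prod_{j=2}^{J} (n^{-c_j}\theta_j)^{a_{ji}}, \quad i \in I.$$

The choice of $\theta_1^{(n)}$ implies that $1'p^{(n)} = 1$. As $p^{(n)} = (p_1^{(n)}, \dots, p_{|I|}^{(n)})'$ is positive and satisfies (3) with $\theta_1^{(n)}$ in place of $\theta_1$ and $n^{-c_j}\theta_j$ in place of $\theta_j$, for $j = 2, \dots, J$, $p^{(n)} \in RM_p(A)$. Next, because $c'a_i = 0$ if $i \in F$,

$$\lim_{n \to \infty} n^{c_1}\theta_1^{(n)} = \lim_{n \to \infty} \frac{n^{c_1}\left(1 - \sum_{a_{1i}=0,\, i \in F} \prod_{j=2}^{J} \theta_j^{a_{ji}} - \sum_{a_{1i}=0,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}\right)}{n^{c_1}\left(\sum_{a_{1i}=1,\, i \in F} \prod_{j=2}^{J} \theta_j^{a_{ji}} + \sum_{a_{1i}=1,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}\right)} = \frac{1 - \sum_{i \in F:\, a_{1i}=0} \prod_{j=2}^{J} \theta_j^{a_{ji}}}{\sum_{i \in F:\, a_{1i}=1} \prod_{j=2}^{J} \theta_j^{a_{ji}}} = \theta_1, \qquad (21)$$

where the last equality holds because $\sum_{i \in F} \delta_i^* = 1$. Further, for $i \in I$, using (21),

$$\lim_{n \to \infty} p_i^{(n)} = \lim_{n \to \infty} n^{c_1 a_{1i} - c'a_i} (\theta_1^{(n)})^{a_{1i}} \prod_{j=2}^{J} \theta_j^{a_{ji}} = \lim_{n \to \infty} n^{-c'a_i} (n^{c_1}\theta_1^{(n)})^{a_{1i}} \prod_{j=2}^{J} \theta_j^{a_{ji}} = \lim_{n \to \infty} n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}} = \begin{cases} \delta_i^*, & \text{if } i \in F, \\ 0, & \text{if } i \notin F. \end{cases}$$

Hence, $p^{(n)} \to \delta^*$ pointwise, as $n \to \infty$.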

Therefore, $\overline{RM}_\delta(A) \subseteq \mathrm{cl}(RM_\delta(A))$.

To prove the converse, choose a $\delta^* \in \mathrm{cl}(RM_\delta(A))$. Then, $\delta^*$ is a pointwise limit of a sequence of distributions in $RM_\delta(A)$, and thus $\delta^*$ is the pointwise limit of a sequence in $X_A$. As $X_A$ is closed in the topology of pointwise convergence [cf. Geiger et al., 2006], $\delta^* \in X_A$. If $\delta = p$, both $\delta^*$ and the sequence converging to it belong to the simplex $\Delta_{|I|-1}$. Therefore, $\delta^* \in \overline{RM}_\delta(A)$, and the proof is complete. □

A.3 Proof of Theorem 4.1

The statement (i) follows from Theorem 2.2 and Lemma A.2. □

In order to prove (ii), notice first that the smallest facial set $F$ of $A$ which contains $\mathrm{supp}(q)$ is uniquely defined. In this case, $A_F q_F \in \mathrm{relint}(C_{A_F})$, and, therefore, $\mathrm{supp}(q)$ is not contained in any facial set of $A_F$. By part (i) of this theorem, the MLE under $RM_{\delta_F}(A_F)$ exists.

Let $\hat{\delta}_y = (\hat{\delta}_{y_F}, 0_{I \setminus F})$. By Lemma A.3, $\hat{\delta}_y \in X_A$. If $\delta = p$, $1'\hat{p}_{y_F} = 1$, and thus $\hat{p}_y$ satisfies the normalization condition $1'\hat{p}_y = 1$. It will be shown next that $\hat{\delta}_y$ maximizes the full log-likelihood of $y$.

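Let $\delta = \lambda$. The log-likelihood under the model $RM_{\lambda_F}(A_F)$ is equal to

$$l_F(q_F, \lambda_F) = \sum_{i \in F} q_{Fi} \log \lambda_{Fi} - \sum_{i \in F} \lambda_{Fi},$$

and for any $\lambda_F > 0$, $l_F(q_F, \lambda_F) \le l_F(q_F, \hat{\lambda}_{y_F})$.

Let $\lambda = (\lambda_F', 0)'$, and let $\lambda^{(n)}$ be the sequence that was described in the proof of Theorem 3.1. The full log-likelihood of the elements of this sequence is

$$\begin{aligned}
l(q, \lambda^{(n)}) &= \sum_{i \in I} q_i \log \lambda_i^{(n)} - \sum_{i \in I} \lambda_i^{(n)} = \sum_{i \in F} q_i \log \lambda_i^{(n)} - \sum_{i \in I} \lambda_i^{(n)} \\
&= \sum_{i \in F} q_i \log \Big\{ n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}} \Big\} - \sum_{i \in I} n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}} \\
&= \sum_{i \in F} q_i \log \Big\{ \prod_{j=1}^{J} \theta_j^{a_{ji}} \Big\} - \sum_{i \in F} \prod_{j=1}^{J} \theta_j^{a_{ji}} - \sum_{i \notin F} n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}} \\
&= l_F(q_F, \lambda_F) - \sum_{i \notin F} n^{-c'a_i} \prod_{j=1}^{J} \theta_j^{a_{ji}}.
\end{aligned}$$

Therefore,

$$l(q, \lambda^{(n)}) < l_F(q_F, \lambda_F) \le l_F(q_F, \hat{\lambda}_{y_F}). \qquad (22)$$

Let $\delta = p$. The log-likelihood under the model $RM_{p_F}(A_F)$ is equal to

$$l_F(q_F, p_F) = \sum_{i=1}^{f} q_{Fi} \log p_{Fi},$$

and for any $p_F > 0$, such that $1'p_F = 1$, $l_F(q_F, p_F) \le l_F(q_F, \hat{p}_{y_F})$.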

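Let $p = (p_F', 0)'$, and let $p^{(n)}$ be the sequence that was described in the proof of Theorem 3.1. The full log-likelihood of the elements of this sequence is

$$\begin{aligned}
l(q, p^{(n)}) &= \sum_{i \in I} q_i \log p_i^{(n)} = \sum_{i \in F} q_i \log p_i^{(n)} \\
&= \sum_{i \in F} q_i \log \Big\{ (\theta_1^{(n)})^{a_{1i}} \prod_{j=2}^{J} (n^{-c_j}\theta_j)^{a_{ji}} \Big\} \\
&= \sum_{i \in F:\, a_{1i}=1} q_i \log \Big\{ n^{c_1}\theta_1^{(n)} \prod_{j=2}^{J} \theta_j^{a_{ji}} \Big\} + \sum_{i \in F:\, a_{1i}=0} q_i \log \prod_{j=2}^{J} \theta_j^{a_{ji}} \\
&= \sum_{i \in F:\, a_{1i}=1} q_i \log \prod_{j=1}^{J} \theta_j^{a_{ji}} + \sum_{i \in F:\, a_{1i}=0} q_i \log \prod_{j=1}^{J} \theta_j^{a_{ji}} - \log \{\theta_1 / (\theta_1^{(n)} n^{c_1})\} \cdot \sum_{i \in F:\, a_{1i}=1} q_i \\
&= l_F(q_F, p_F) - \log \{\theta_1 / (\theta_1^{(n)} n^{c_1})\} \cdot \sum_{i \in F:\, a_{1i}=1} q_i.
\end{aligned}$$

It will be shown next that $\theta_1 / (\theta_1^{(n)} n^{c_1}) > 1$.

$$\begin{aligned}
\frac{\theta_1}{\theta_1^{(n)} n^{c_1}} &= \frac{1 - \sum_{i \in F:\, a_{1i}=0} \prod_{j=2}^{J} \theta_j^{a_{ji}}}{\sum_{i \in F:\, a_{1i}=1} \prod_{j=2}^{J} \theta_j^{a_{ji}}} \cdot \frac{n^{c_1}\left(\sum_{a_{1i}=1,\, i \in F} \prod_{j=2}^{J} \theta_j^{a_{ji}} + \sum_{a_{1i}=1,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}\right)}{n^{c_1}\left(1 - \sum_{a_{1i}=0,\, i \in F} \prod_{j=2}^{J} \theta_j^{a_{ji}} - \sum_{a_{1i}=0,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}\right)} \\
&= \left(1 + \frac{\sum_{a_{1i}=1,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}}{\sum_{i \in F:\, a_{1i}=1} \prod_{j=2}^{J} \theta_j^{a_{ji}}}\right) \Bigg/ \left(1 - \frac{\sum_{a_{1i}=0,\, i \notin F} n^{-c'a_i} \prod_{j=2}^{J} \theta_j^{a_{ji}}}{1 - \sum_{i \in F:\, a_{1i}=0} \prod_{j=2}^{J} \theta_j^{a_{ji}}}\right) > 1. \qquad (23)
\end{aligned}$$

Therefore,

$$l(q, p^{(n)}) < l_F(q_F, p_F) \le l_F(q_F, \hat{p}_{y_F}). \qquad (24)$$

Combining (22) and (24),

$$\sup_n l(q, \delta^{(n)}) \le l_F(q_F, \hat{\delta}_{y_F}).$$

Hence, whenever $\delta^{(n)} \to \hat{\delta}_y$ as $n \to \infty$, $l(q, \delta^{(n)}) \to l_F(q_F, \hat{\delta}_{y_F})$.

Therefore, $l(q, \hat{\delta}_y) = \sup l(q, \delta) = l_F(q_F, \hat{\delta}_{y_F})$, which concludes the proof of (ii). □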

The uniqueness claim in (iii) follows from the convexity of the log-likelihood function. The proof is similar to the one given by Lauritzen [1996, Proposition 4.7] for the case of extended log-affine models, and is thus omitted. In order to prove the second claim, suppose first that there exists a facial set $F$ such that $\mathrm{supp}(q) \subseteq F$. Let $F$ be the minimal of such sets. As shown in the proof of (ii), the MLE $\hat{\delta}_{y_F}$ under $RM_{\delta_F}(A_F)$ exists, and, from (4),

$$A_F \hat{\delta}_{y_F} = \gamma A_F q_F, \text{ for some } \gamma > 0, \text{ and, if } \delta = p, \ 1'\hat{\delta}_{y_F} = 1.$$

The MLE under $\overline{RM}_\delta(A)$ is equal to $\hat{\delta}_y = (\hat{\delta}_{y_F}, 0_{I \setminus F})$. As $\hat{\delta}_{y,i} = 0$ for $i \notin F$, $A\hat{\delta}_y = \gamma A q$, and, in the case of probabilities, $1'\hat{\delta}_y = 1$.

If, for all facial sets $F$, $\mathrm{supp}(q) \not\subseteq F$, then the MLE $\hat{\delta}_y$ under the extended model exists and is also the MLE under $RM_\delta(A)$. In this case, (4) holds and is the same as (15), which completes the proof. □